Fix #8926: ListSerializer supports instance access during validation for many=True #9879
zainnadeem786 wants to merge 12 commits into encode:main from …
d42540b to c205e9f
…ng validation and passes all tests
c205e9f to f0375ca
Pull request overview
This PR addresses issue #8926 by implementing automated instance matching in ListSerializer for bulk validation operations with many=True. The core enhancement allows child serializers to access their corresponding instance during validation by automatically matching input data to instances using id or pk fields.
Changes:
- Automated instance-to-data matching in ListSerializer.run_child_validation using a pk-based lookup map
- Enhanced error handling with consistent ErrorDetail wrapping for validation errors
- Updated test suite with new regression test and corrected assertions for validation behavior
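The pk-based lookup described in the list above can be sketched in plain Python, without importing Django REST Framework. The helper names `build_instance_map` and `match_instance` are hypothetical and are not part of DRF or of this PR; the sketch only mirrors the logic quoted in the review, with one deliberate difference: the incoming key is checked with `is None` rather than `or`, so a falsy key such as `0` is not skipped.

```python
from collections.abc import Mapping
from types import SimpleNamespace


def build_instance_map(instances):
    """Return {str(pk): obj}. Hypothetical helper, not part of DRF."""
    if instances is None:
        return {}
    if isinstance(instances, Mapping):
        # A mapping is assumed to be keyed by primary key already.
        return {str(key): value for key, value in instances.items()}
    instance_map = {}
    for obj in instances:
        pk = getattr(obj, 'pk', getattr(obj, 'id', None))
        if pk is not None:
            # Duplicate keys: the last object seen wins.
            instance_map[str(pk)] = obj
    return instance_map


def match_instance(instance_map, item):
    """Return the object whose key matches the item's id/pk, else None."""
    if not isinstance(item, dict):
        return None
    data_pk = item.get('id')
    if data_pk is None:
        data_pk = item.get('pk')
    if data_pk is None:
        return None
    return instance_map.get(str(data_pk))


# Stand-in objects; real code would use model instances.
books = [SimpleNamespace(pk=1, title='Old A'), SimpleNamespace(pk=2, title='Old B')]
instance_map = build_instance_map(books)
matched = match_instance(instance_map, {'id': 2, 'title': 'New B'})
unmatched = match_instance(instance_map, {'title': 'No key'})
```

Keys are normalised to strings on both sides, so an incoming `'2'` and a stored integer `2` resolve to the same object. Items without an `id` or `pk` get no instance.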
Reviewed changes
Copilot reviewed 2 out of 3 changed files in this pull request and generated 15 comments.
| File | Description |
|---|---|
| rest_framework/serializers.py | Core changes to ListSerializer adding automated instance matching, improved error handling, and validation flow updates |
| tests/test_serializer_lists.py | Updated existing tests for consistency and added regression test for issue #8926 |
| .gitignore | Added venv/ to ignored paths |
Comments suppressed due to low confidence (1)
rest_framework/serializers.py:729
- Several docstrings and inline comments were removed (e.g., for get_value, run_validation, to_internal_value, to_representation methods). While the code may be self-documenting to some extent, these comments provided useful context about the purpose and behavior of these methods. Consider keeping at least the docstrings for public methods to maintain API documentation quality, especially since this is a framework used by many developers.
    def get_value(self, dictionary):
        if html.is_html_input(dictionary):
            return html.parse_html_list(dictionary, prefix=self.field_name, default=empty)
        return dictionary.get(self.field_name, empty)

    def run_validation(self, data=empty):
        is_empty_value, data = self.validate_empty_values(data)
        if is_empty_value:
            return data
        value = self.to_internal_value(data)
        try:
            self.run_validators(value)
            value = self.validate(value)
            assert value is not None, '.validate() should return the validated data'
        except (ValidationError, DjangoValidationError) as exc:
            raise ValidationError(detail=as_serializer_error(exc))
        return value

    def run_child_validation(self, data):
        child = copy.deepcopy(self.child)
        if getattr(self, 'partial', False) or getattr(self.root, 'partial', False):
            child.partial = True
        # Field.__deepcopy__ re-instantiates the field, wiping any state.
        # If the subclass set an instance or initial_data on self.child,
        # we manually restore them to the deepcopied child.
        child_instance = getattr(self.child, 'instance', None)
        if child_instance is not None and child_instance is not self.instance:
            child.instance = child_instance
        elif hasattr(self, '_instance_map') and isinstance(data, dict):
            # Automated instance matching (#8926)
            data_pk = data.get('id') or data.get('pk')
            if data_pk is not None:
                child.instance = self._instance_map.get(str(data_pk))
            else:
                child.instance = None
        else:
            child.instance = None
        child_initial_data = getattr(self.child, 'initial_data', empty)
        if child_initial_data is not empty:
            child.initial_data = child_initial_data
        else:
            # Set initial_data for item-level validation if not already set.
            child.initial_data = data
        validated = child.run_validation(data)
        return validated

    def to_internal_value(self, data):
        if html.is_html_input(data):
            data = html.parse_html_list(data, default=[])
        if not isinstance(data, list):
            raise ValidationError({
                api_settings.NON_FIELD_ERRORS_KEY: [
                    self.error_messages['not_a_list'].format(input_type=type(data).__name__)
                ]
            })
        if not self.allow_empty and len(data) == 0:
            raise ValidationError({
                api_settings.NON_FIELD_ERRORS_KEY: [ErrorDetail(self.error_messages['empty'], code='empty')]
            })
        if self.max_length is not None and len(data) > self.max_length:
            raise ValidationError({
                api_settings.NON_FIELD_ERRORS_KEY: [ErrorDetail(self.error_messages['max_length'].format(max_length=self.max_length), code='max_length')]
            })
        if self.min_length is not None and len(data) < self.min_length:
            raise ValidationError({
                api_settings.NON_FIELD_ERRORS_KEY: [ErrorDetail(self.error_messages['min_length'].format(min_length=self.min_length), code='min_length')]
            })
        # Build a primary key mapping for instance updates (#8926)
        instance_map = {}
        if self.instance is not None:
            if isinstance(self.instance, Mapping):
                instance_map = {str(k): v for k, v in self.instance.items()}
            elif hasattr(self.instance, '__iter__'):
                for obj in self.instance:
                    pk = getattr(obj, 'pk', getattr(obj, 'id', None))
                    if pk is not None:
                        instance_map[str(pk)] = obj
        self._instance_map = instance_map
        try:
            ret = []
            errors = []
            for item in data:
                try:
                    validated = self.run_child_validation(item)
                except ValidationError as exc:
                    errors.append(exc.detail)
                else:
                    ret.append(validated)
                    errors.append({})
            if any(errors):
                raise ValidationError(errors)
            return ret
        finally:
            delattr(self, '_instance_map')

    def to_representation(self, data):
        # Dealing with nested relationships, data can be a Manager,
        # so, first get a queryset from the Manager if needed.
        # We avoid .all() on QuerySets to preserve Issue #2704 behavior.
        iterable = data.all() if isinstance(data, models.manager.BaseManager) else data
        return [
            self.child.to_representation(item) for item in iterable
        ]
rest_framework/serializers.py
Outdated
    elif hasattr(self.instance, '__iter__'):
        for obj in self.instance:
            pk = getattr(obj, 'pk', getattr(obj, 'id', None))
            if pk is not None:
                instance_map[str(pk)] = obj
The instance mapping logic checks hasattr(self.instance, '__iter__') to handle iterables. However, strings are also iterable in Python, which could cause unexpected behavior if a string is accidentally passed as instance. Consider adding an additional check to ensure self.instance is not a string, or explicitly check for list/tuple/QuerySet types.
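One way to act on the concern above, as a minimal sketch: reject strings and bytes before iterating, and accept only mappings and other iterables. The helper `iter_instances` is hypothetical and is not the check the PR ended up using; it only illustrates why a bare `hasattr(obj, '__iter__')` test is too permissive.

```python
from collections.abc import Iterable, Mapping


def iter_instances(instance):
    """Return a list of candidate objects. Hypothetical helper, not DRF API."""
    if instance is None:
        return []
    if isinstance(instance, (str, bytes)):
        # Strings are iterable, but iterating one yields characters, not objects.
        raise TypeError(
            'expected a collection of objects, got %s' % type(instance).__name__
        )
    if isinstance(instance, Mapping):
        return list(instance.values())
    if isinstance(instance, Iterable):
        return list(instance)
    raise TypeError(
        'expected a collection of objects, got %s' % type(instance).__name__
    )


objects = iter_instances(({'id': 1}, {'id': 2}))

try:
    iter_instances('abc')
    string_rejected = False
except TypeError:
    string_rejected = True
```

Raising early surfaces the mistake at the call site instead of silently producing an empty lookup map.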
    def save(self, **kwargs):
        """
        Save and return a list of object instances.
        """
        # Guard against incorrect use of `serializer.save(commit=False)`
        assert 'commit' not in kwargs, (
            "'commit' is not a valid keyword argument to the 'save()' method. "
            "If you need to access data before committing to the database then "
            "inspect 'serializer.validated_data' instead. "
            "You can also pass additional keyword arguments to 'save()' if you "
            "need to set extra attributes on the saved model instance. "
            "For example: 'serializer.save(owner=request.user)'.'"
        )

        validated_data = [
            {**attrs, **kwargs} for attrs in self.validated_data
        ]
        assert hasattr(self, 'validated_data'), "Call `.is_valid()` before `.save()`."
        validated_data = [{**item, **kwargs} for item in self.validated_data]

        if self.instance is not None:
            self.instance = self.update(self.instance, validated_data)
            assert self.instance is not None, (
                '`update()` did not return an object instance.'
            )
        else:
            self.instance = self.create(validated_data)
            assert self.instance is not None, (
                '`create()` did not return an object instance.'
            )

        return self.instance
The ListSerializer.save() method overrides BaseSerializer.save() but removes several important assertions that exist in the parent class: the 'commit' keyword check, the check preventing save() after accessing data, and the checks for errors. This means users could potentially call save(commit=False) on a ListSerializer or save after accessing .data without getting the helpful error messages from BaseSerializer. Consider re-adding these checks or calling super().save() if possible to maintain consistency with the parent class behavior.
tests/test_serializer_lists.py
Outdated
    def test_many_true_instance_level_validation_guidance():
        class Obj:
            def __init__(self, valid):
                self.valid = valid

        class TestSerializer(serializers.Serializer):
            status = serializers.CharField()

            def validate_status(self, value):
                if self.instance is None:
                    # Provide guidance if user tries to use instance-level validation with many=True
                    raise serializers.ValidationError(
                        "You tried to access self.instance in a many=True update, "
                        "but it is not set by default. Override run_child_validation "
                        "to set the individual instance."
                    )
                if not self.instance.valid:
                    raise serializers.ValidationError("Invalid instance")
                return value

        objs = [Obj(True), Obj(False)]

        serializer = TestSerializer(
            instance=objs,
            data=[{"status": "ok"}, {"status": "fail"}],
            many=True,
            partial=True,
        )

        with pytest.raises(serializers.ValidationError) as exc:
            serializer.is_valid(raise_exception=True)

        assert "run_child_validation" in str(exc.value)
The test test_many_true_instance_level_validation_guidance creates objects without 'id' or 'pk' attributes, so the automated instance matching won't occur and self.instance will remain None. This means the test correctly validates that when instances don't have id/pk attributes, the validation guidance message is raised. However, the test name and intent suggest it's demonstrating a pattern that users should follow, when actually the PR introduces automated matching that eliminates the need for this pattern in most cases. Consider renaming this test or adding a comment to clarify it's testing the edge case where objects don't have id/pk attributes.
        else:
            ret.append(validated)
            errors.append({})
    self._instance_map = instance_map
The _instance_map is stored as an instance attribute during validation. If the same ListSerializer instance is used concurrently (e.g., in multiple threads), this could lead to race conditions where one thread's instance map overwrites another's. While Django REST Framework serializers are typically instantiated per-request, this could be an issue in certain deployment scenarios. Consider documenting that ListSerializer instances should not be shared across threads during validation, or adding thread-safety mechanisms if concurrent usage is expected.
can you please cross check this?
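For the shared-state concern raised above, one alternative is to keep the lookup map in a local variable and pass it to the per-item step, so nothing is stored on the object between calls. This is a sketch under assumptions: `BulkValidator`, `validate_item` and `validate_all` are hypothetical stand-ins, not DRF classes or methods, and DRF's `run_child_validation(self, data)` takes only `data`, so adopting this shape there would change a method signature that subclasses may override.

```python
class BulkValidator:
    """Stand-in for a list serializer; hypothetical, not DRF code."""

    def __init__(self, instances):
        self.instances = instances

    def validate_item(self, item, instance_map):
        # The map arrives as an argument, so no per-call state lives on self.
        key = item.get('id')
        instance = instance_map.get(str(key)) if key is not None else None
        return {'data': item, 'instance': instance}

    def validate_all(self, data):
        # Local variable: each call builds and discards its own map.
        instance_map = {str(obj['id']): obj for obj in self.instances}
        return [self.validate_item(item, instance_map) for item in data]


validator = BulkValidator([{'id': 1, 'name': 'a'}, {'id': 2, 'name': 'b'}])
results = validator.validate_all([{'id': 2}, {'id': 9}, {'name': 'no id'}])
```

The trade-off is API compatibility: storing the map on `self` and removing it in a `finally` block leaves the existing method signatures untouched, at the cost of temporary mutable state.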
    if self.instance is not None:
        self.instance = self.update(self.instance, validated_data)
The assertion assert self.instance is not None, ('update() did not return an object instance.') was removed from the save method. This removes an important safety check that ensures update() returns a valid instance. Without this check, if update() returns None, self.instance will be set to None without any warning, which could lead to silent failures. Consider keeping this assertion for defensive programming.
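The failure mode described above can be reproduced with a small stand-in class. `BulkSaver` is hypothetical and not DRF code; its `update()` deliberately forgets to return, which is the mistake the assertion is meant to catch.

```python
class BulkSaver:
    """Stand-in for a list serializer's save(); hypothetical, not DRF code."""

    def __init__(self, instance, validated_data):
        self.instance = instance
        self.validated_data = validated_data

    def update(self, instance, validated_data):
        for obj, attrs in zip(instance, validated_data):
            obj.update(attrs)
        # Deliberate bug: the updated list is never returned.

    def save(self):
        result = self.update(self.instance, self.validated_data)
        # Without this check, self.instance would silently become None.
        assert result is not None, '`update()` did not return an object instance.'
        self.instance = result
        return self.instance


saver = BulkSaver([{'id': 1, 'status': 'old'}], [{'status': 'new'}])
try:
    saver.save()
    error_message = None
except AssertionError as exc:
    error_message = str(exc)
```

Here the objects were updated in place, yet the caller is still told that `update()` broke its contract, which is easier to debug than a later `None` dereference.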
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
auvipy left a comment
can you please cross check the suggestions?
…ions, standardize errors
Pushed follow-up fixes addressing the review feedback:
- Standardized not_a_list error output using ErrorDetail.
- Updated instance iterable checks to use explicit types (…
- Preserved ListSerializer.save() safety assertions, including …
- Made _instance_map cleanup defensive.
- Documented duplicate-key behavior in instance mapping (last-write-wins semantics).

Validation performed locally:
- tests/test_serializer_lists.py
- Issue #2704 regression test
- Full test suite

All tests passing.

Thanks for the review. I’ve now restored the unintended docstring regressions in serializers.py.

What was fixed
…

Scope
This change is docstring-only.

Validation
Ran full test suite locally: all passing.
Pull request overview
Copilot reviewed 2 out of 3 changed files in this pull request and generated 2 comments.
Summary
This PR fixes issue #8926 by updating ListSerializer to preserve and provide access to self.instance during validation when many=True. Previously, child serializers in bulk updates could not access their corresponding instance, causing AssertionErrors and inconsistent behavior. This update ensures that each item in a list serializer automatically matches its input data to the correct instance using id or pk.

Key Enhancements

Automated Instance Matching
- ListSerializer.run_child_validation now attempts to match input data to items in self.instance.
- … instance assignments.

Validation Fixes
- … validated_data by returning run_validation results directly.
- … instance and initial_data in deepcopied child serializers.
- … partial=True) correctly propagate from root serializer to list items.
- … to_internal_value for positional list errors.

Test Suite Updates
- … tests/test_serializer_lists.py to reflect consistent validation and instance matching behavior.
- … test_many_true_regression_8926 to confirm that validate_<field> methods can now access self.instance during bulk updates.

Verification
- pytest tests/test_serializer_lists.py → all 37 tests passed.
- … allow_empty, min_length, max_length, and nested serializers.

Notes
- … ListSerializer.
- … many=True, particularly for update operations.

Related Issues
- … self.instance when validating the serializer using many=True #8926